AITopics | q-value function

Q-Distribution guided Q-learning for offline reinforcement learning: Uncertainty penalized Q-value via consistency model

Neural Information Processing Systems, Mar-20-2026, 21:53:05 GMT

Because a learning policy may take actions beyond the knowledge of the behavior policy (referred to as out-of-distribution (OOD) actions), the Q-values of these OOD actions can easily be overestimated. Consequently, the learning policy is optimized in a biased way against an incorrectly recovered Q-value function. A common way to avoid overestimating Q-values is to make a pessimistic adjustment. Our key idea is to penalize the Q-values of OOD actions that correspond to high uncertainty. In this work, we propose Q-Distribution guided Q-learning (QDQ), which applies a pessimistic adjustment to Q-values in OOD regions based on uncertainty estimation. The uncertainty measure is based on the conditional Q-value distribution, which is learned via a high-fidelity and efficient consistency model. To avoid overly conservative estimates, we also introduce an uncertainty-aware optimization objective for updating the Q-value function. QDQ comes with theoretical guarantees for the accuracy of Q-value distribution learning and uncertainty measurement, as well as for the performance of the learning policy. QDQ consistently exhibits strong performance on the D4RL benchmark and shows significant improvements on many tasks.

artificial intelligence, machine learning, reinforcement learning, (6 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.76)

4b121e627d3c5683f312ad168988f3f0-Supplemental-Conference.pdf

Neural Information Processing Systems, Feb-19-2026, 02:43:33 GMT

A.2 Main Proof Sketch. In this section we give a theoretical guarantee for the performance of our algorithm. Essentially, it measures the largest total difference of value estimation among all the functions f ∈ F_t for the fixed inputs x_{t,i}, where i ∈ [M]. Lemma 2. If (β_t ≥ 0 | t ∈ N) is a nondecreasing sequence and F_t := … The main structure of this proof is similar to Proposition 3, Section C in the Eluder dimension paper, and we will only point out the subtle details that make the difference. Apart from the notation in Section 3, we add more symbols for the regret analysis. Next, we will show that f_h is a feasible solution for the optimization of F_t.

artificial intelligence, def, machine learning, (18 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.68)

Provably and Practically Efficient Adversarial Imitation Learning with General Function Approximation

Neural Information Processing Systems, Feb-16-2026, 00:20:20 GMT

As a prominent category of imitation learning methods, adversarial imitation learning (AIL) has garnered significant practical success powered by neural network approximation. However, existing theoretical studies on AIL are primarily limited to simplified scenarios such as tabular and linear function approximation and involve complex algorithmic designs that hinder practical implementation, highlighting a gap between theory and practice.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Country:

Asia > China > Jiangsu Province > Nanjing (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Education > Educational Setting (0.45)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.62)

8e806d3c56ed5f1dab85d601e13cbe38-Paper-Conference.pdf

Neural Information Processing Systems, Feb-15-2026, 20:09:44 GMT

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Country:

Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > California (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Regularized Anderson Acceleration for Off-Policy Deep Reinforcement Learning

Wenjie Shi, Shiji Song, Hui Wu, Ya-Chu Hsu, Cheng Wu, Gao Huang

Neural Information Processing Systems, Feb-13-2026, 20:02:50 GMT

Model-free deep reinforcement learning (RL) algorithms have been widely used for a range of complex control tasks.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Country:

Asia > China > Beijing > Beijing (0.05)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

A Boolean Task Algebra For Reinforcement Learning

Neural Information Processing Systems, Feb-8-2026, 19:02:12 GMT

A major challenge is thus in designing sample-efficient agents that can transfer their existing knowledge to solve new tasks quickly.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Country:

North America > United States (0.04)
North America > Canada (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)
Africa > South Africa > Gauteng > Johannesburg (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.85)

Episodic Multi-agent Reinforcement Learning with Curiosity-driven Exploration

Neural Information Processing Systems, Feb-7-2026, 18:44:15 GMT

Efficient exploration in deep cooperative multi-agent reinforcement learning (MARL) remains challenging in complex coordination problems.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Country:

Asia > China (0.05)
North America > Canada (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

61caa89f7a5366023db6f5736b2c579d-Paper-Conference.pdf

Neural Information Processing Systems, Nov-19-2025, 03:46:41 GMT

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.71)

Provably and Practically Efficient Adversarial Imitation Learning with General Function Approximation

Neural Information Processing Systems, Oct-10-2025, 06:48:11 GMT

As a prominent category of imitation learning methods, adversarial imitation learning (AIL) has garnered significant practical success powered by neural network approximation. However, existing theoretical studies on AIL are primarily limited to simplified scenarios such as tabular and linear function approximation and involve complex algorithmic designs that hinder practical implementation, highlighting a gap between theory and practice.

function approximation, general function approximation, opt-ail, (13 more...)

Country: